Syntactic Disambiguation by Learning Weighted Government Patterns from a Large Corpus
Authors
Abstract
This paper describes a method of syntactic disambiguation based on correct attachment of prepositional phrases or, more generally, of clauses in specific grammatical cases. The research was based on Spanish and Russian material. The data set that the procedure builds and uses is a kind of dictionary of syntactic government patterns. The algorithm requires a morphological parser and a syntactic parser, and it assigns probability weights to the variants built by the syntactic parser. No manual markup is required. At the training stage, the procedure works iteratively on a large text corpus, alternately re-estimating the frequencies of individual government patterns and the weights of the variants. The method is compatible with other methods of estimating the variants. The data set built by the algorithm is also useful for compiling a combinatory dictionary for human readers. Some generalizations of the method are discussed.
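The alternating re-estimation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each sentence arrives as a list of parse variants, that each variant is simply a list of government patterns written as hypothetical `(governing word, preposition)` pairs, and that a variant is scored by the product of smoothed pattern frequencies. The function name `train`, the `smoothing` parameter, and the toy corpus are all assumptions made for illustration.

```python
from collections import defaultdict
from math import prod


def train(corpus, iterations=10, smoothing=0.1):
    """Alternately re-estimate pattern frequencies and variant weights.

    corpus: list of sentences; each sentence is a list of parse variants;
    each variant is a list of hashable government patterns.
    The scoring rule (product of smoothed frequencies) is an assumption.
    """
    # Start with uniform weights over the variants of each sentence.
    weights = [[1.0 / len(sent)] * len(sent) for sent in corpus]
    freq = defaultdict(float)
    for _ in range(iterations):
        # Step 1: pattern frequencies as weighted counts over all variants.
        freq = defaultdict(float)
        for sent, sent_weights in zip(corpus, weights):
            for variant, w in zip(sent, sent_weights):
                for pattern in variant:
                    freq[pattern] += w
        # Step 2: variant weights from pattern frequencies, normalized
        # so that the weights within one sentence sum to 1.
        new_weights = []
        for sent in corpus:
            scores = [prod(freq[p] + smoothing for p in variant)
                      for variant in sent]
            total = sum(scores)
            new_weights.append([s / total for s in scores])
        weights = new_weights
    return dict(freq), weights


# Toy corpus (hypothetical): the first sentence has a single parse,
# the second has two competing attachments of the same preposition.
corpus = [
    [[("speak", "with")]],
    [[("speak", "with")], [("friend", "with")]],
]
freq, weights = train(corpus)
print(weights[1])
```

In this toy run the unambiguous first sentence raises the frequency of the pattern `("speak", "with")`, so the first variant of the second sentence gains weight with each iteration. A product score favors variants with fewer patterns when frequencies are below 1, so a practical scoring rule would need some form of normalization.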
Similar resources
Identifying Syntactic Role of Antecedent in Korean Relative Clause Using Corpus and Thesaurus Information
This paper describes an approach to identifying the syntactic role of an antecedent in a Korean relative clause, which is essential to structural disambiguation and semantic analysis. In a learning phase, linguistic knowledge such as conceptual co-occurrence patterns and syntactic role distribution of antecedents is extracted from a large-scale corpus. Then, in an application phase, the extract...
Extracting hypernym relations from Wikipedia disambiguation pages: comparing symbolic and machine learning approaches
Extracting hypernym relations from text is one of the key steps in the construction and enrichment of semantic resources. Several methods have been exploited in a variety of propositions in the literature. However, the strengths of each approach on the same corpus are still poorly identified, which makes it hard to take advantage of their complementarity. In this paper, we study how complementary two ...
Unsupervised Learning of Syntactic Knowledge: Methods and Measures
Supervised methods for ambiguity resolution learn in "sterile" environments, in absence of syntactic noise. However, in many language engineering applications manually tagged corpora are not available nor easily implemented. On the other side, the "exportability" of disambiguation cues acquired from a given, noise-free, domain (e.g. the Wall Street Journal) to other domains is not obvious. Unsu...
Intensive Use of Lexicon and Corpus for WSD
The paper addresses the issue of how to use linguistic information in Word Sense Disambiguation (WSD). We introduce a knowledge-driven and unsupervised WSD method that requires only a large corpus previously tagged with POS and very little grammatical knowledge. The WSD process is performed taking into account the syntactic patterns in which the ambiguous occurrence appears, relying on the hyp...
Developing a syntactic analyser for Estonian
The aim of the present article is to give an overview of the current state of syntactic analysis of Estonian and describe problems that were encountered in the generation of syntactic rules for the syntactic analyser of Estonian. So far only the rules based on linguistics have been used. This article is focused on the statistical methods in syntactic analysis and it describes the experiments of...